Target-Aware Lattice Rescoring for Dialect Recognition
Abstract
We observed that human listeners distinguish one dialect from another by paying special attention to particular phonetic and/or phonotactic patterns. Motivated by this observation, we propose a technique that emulates this process. We explore a target-aware lattice rescoring (TALR) process that revises the n-gram statistics in a lattice with target dialect information. We then derive n-gram statistics from the lattice as phonotactic features and develop a system under the vector space modeling framework. The experimental results show that the proposed technique consistently improves dialect recognition performance on 30-second test utterances. We achieved equal error rates (EERs) of 4.57% and 13.28% with 3-gram statistics for Chinese and English dialect recognition, respectively, on the 2007 NIST Language Recognition Evaluation 30-second closed test sets.
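To make the idea of lattice-derived phonotactic features concrete, here is a minimal sketch of how expected n-gram counts could be read off a phone lattice. It is not the authors' implementation. The edge-list lattice format, the function names, and the toy lattice are all assumptions made for illustration. The sketch enumerates paths by brute force, whereas a practical system would use a forward-backward pass. The target-aware rescoring step itself is not shown, because the abstract does not specify how edge scores are revised.

```python
from collections import defaultdict


def enumerate_paths(edges, start, end):
    """Yield (phones, weight) for every path from start to end in a DAG lattice.

    Each edge is a hypothetical (src, dst, phone, weight) tuple.
    """
    outgoing = defaultdict(list)
    for src, dst, phone, weight in edges:
        outgoing[src].append((dst, phone, weight))

    stack = [(start, (), 1.0)]
    while stack:
        node, phones, weight = stack.pop()
        if node == end:
            yield phones, weight
            continue
        for dst, phone, w in outgoing[node]:
            stack.append((dst, phones + (phone,), weight * w))


def expected_ngram_counts(edges, start, end, n):
    """Return posterior-weighted n-gram counts over all lattice paths."""
    paths = list(enumerate_paths(edges, start, end))
    total = sum(w for _, w in paths)
    if total == 0:
        return {}
    counts = defaultdict(float)
    for phones, w in paths:
        posterior = w / total  # normalise path weights into posteriors
        for i in range(len(phones) - n + 1):
            counts[phones[i:i + n]] += posterior
    return dict(counts)


# Toy lattice (assumed data): two alternatives for the first and last phone.
TOY_LATTICE = [
    (0, 1, "a", 0.6),
    (0, 1, "e", 0.4),
    (1, 2, "b", 1.0),
    (2, 3, "a", 0.5),
    (2, 3, "o", 0.5),
]

if __name__ == "__main__":
    for ngram, count in sorted(expected_ngram_counts(TOY_LATTICE, 0, 3, 2).items()):
        print(ngram, round(count, 3))
```

Under this sketch, a rescoring step would change the edge weights before the counts are taken, so the resulting feature vector would shift toward the n-grams favored for the target dialect. The counts for each utterance would then form one vector in the vector space model.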
Related resources
Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition
Using context in automatic speech recognition allows the recognition system to dynamically task-adapt and bring gains to a broad variety of use-cases. An important mechanism of context inclusion is on-the-fly rescoring of hypotheses with contextual language model content available only in real-time. In systems where rescoring occurs on the lattice during its construction as part of beam search d...
On-the-fly lattice rescoring for real-time automatic speech recognition
This paper presents a method for rescoring speech recognition lattices on-the-fly to increase word accuracy while preserving the low latency of a real-time speech recognition system. In large vocabulary speech recognition systems, pruned and/or lower order n-gram language models are often used in the first pass of the speech decoder due to the computational complexity. The output word latti...
Tone information as a confidence measure for improving Cantonese LVCSR
Cantonese, a syllabically paced, southern Chinese dialect, is also a tonal language. A Cantonese syllable can have up to 9 different tone patterns which are lexically important. In this paper, after reviewing major approaches to incorporating tone information into a large vocabulary continuous speech recognition (LVCSR) system, we propose two schemes to employ the tone information as a confidenc...
Fuzzy class rescoring: a part-of-speech language model
Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N-best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how t...
Phonetic recognition using a statistical hidden dynamic model of speech
This paper presents new results on evaluation of the statistical coarticulatory hidden dynamic model (HDM) on the TIMIT phone recognition task. We train both the HDM and baseline HMM on the complete TIMIT training data set and evaluate both systems using the N-best rescoring algorithm on the TIMIT test data set and the dr8 dialect subset. We show that with the inclusion of the reference transcr...
Publication date: 2011